
Add model for Sesame TTS#36

Merged
Blaizzy merged 15 commits into Blaizzy:main from lucasnewman:sesame
Mar 17, 2025

Conversation

@lucasnewman
Collaborator

@lucasnewman lucasnewman commented Mar 15, 2025

Support for the Sesame TTS model, based on the official implementation and pre-trained model here.

Example usage:
python -m mlx_audio.tts.generate --model lucasnewman/csm-1b-mlx --play --text "Hello from Sesame."

TODO:

  • Basic architecture support
  • PyTorch weight loading
  • Audio generation loop running
  • Debug output to match PyTorch
  • Watermarking
  • Load weights as safetensors
  • Integrate into CLI generation

I'll save quantization support as a follow-up since it's not really my area of expertise.

@Blaizzy
Owner

Blaizzy commented Mar 15, 2025

Great job @lucasnewman, I love the speed! 🚀

The mimi codec will be really useful for some models on our roadmap.

FYI, there is an existing repo you can take inspiration from:

https://github.com/senstella/csm-mlx

I will help out when I return from vacation this coming week.

@lucasnewman
Collaborator Author

Thanks for the reference! Basic audio gen is working now -- here's a sample.

@lucasnewman lucasnewman marked this pull request as ready for review March 16, 2025 03:54
@lucasnewman lucasnewman changed the title from [WIP] Add model for Sesame TTS to Add model for Sesame TTS Mar 16, 2025
@lucasnewman
Collaborator Author

@Blaizzy I don't know if you want to use the (unquantized) model I uploaded to HF or another repo -- it's up to you! This is what the output looks like:

Model: lucasnewman/csm-1b-mlx
Text: Hello from Sesame.
Voice: af_heart
Speed: 1.0x
Language: a
==========
Audio generated successfully, saving to audio!
==========
Duration:              00:00:01.280
Samples/sec:           18930.8
Prompt:                33 tokens, 20.3 tokens-per-sec
Audio:                 30720 samples, 18930.8 samples-per-sec
Real-time factor:      1.27x
Processing time:       1.62s

The voice, speed, & language aren't applicable here but I was trying to be as surgical as possible with the model loading / generate changes. Feel free to change it up to whatever you'd like.
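As a sanity check, the stats in that log are internally consistent. A quick back-of-the-envelope calculation (this helper is not part of mlx_audio; it assumes a 24 kHz output sample rate, which matches 30720 samples over a 1.280 s duration):

```python
# Back-of-the-envelope check of the generation stats above.
num_samples = 30720
processing_time = 1.62  # seconds, from the log
sample_rate = 24_000    # assumed; consistent with 30720 samples / 1.280 s

duration = num_samples / sample_rate             # 1.280 s of audio
samples_per_sec = num_samples / processing_time  # ~18963, close to the reported 18930.8
rtf = processing_time / duration                 # ~1.27x; here >1x means slower than real time

print(f"Duration: {duration:.3f}s, RTF: {rtf:.2f}x")
```

Note that with this convention a real-time factor above 1.0x means generation takes longer than the audio it produces.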

@Blaizzy
Owner

Blaizzy commented Mar 16, 2025

This is phenomenal @lucasnewman, you crushed it! 🔥

I'll review and merge tomorrow, and handle the quantization as well.

> @Blaizzy I don't know if you want to use the (unquantized) model I uploaded to HF or another repo -- it's up to you! This is what the output looks like:

Could you upload your copy to mlx-community with the name:
mlx-community/csm-1b-bf16

and update the path in utils.py?

> The voice, speed, & language aren't applicable here but I was trying to be as surgical as possible with the model loading / generate changes. Feel free to change it up to whatever you'd like.

I'm thinking about a general API design. For instance, in my view ref_audio == voice; depending on the model, it will alternate between text and a path to an audio file. But I'll save refactoring for v0.1.0, when we get STS up and running.

@lucasnewman
Collaborator Author

> Could you upload your copy to mlx-community with the name: mlx-community/csm-1b-bf16

The base model is fp32, not bf16, so I'll put it at mlx-community/csm-1b 👍

@lucasnewman
Collaborator Author

> I'm thinking about a general API design. For instance, in my view ref_audio == voice; depending on the model, it will alternate between text and a path to an audio file. But I'll save refactoring for v0.1.0, when we get STS up and running.

Yep, that makes sense. We'll need some kind of voice_caption / voice_text parameter for Sesame and models like F5-TTS, since they require the caption alongside the reference audio.
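One way to picture the unified voice parameter being discussed: a "voice" is either a named preset or a reference-audio path plus its caption. This is a hypothetical sketch of the idea, not the actual mlx_audio API; the `Voice` class and `resolve_voice` helper are invented here for illustration, while the `ref_audio` and `voice_text` names come from the discussion above:

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical sketch -- not mlx_audio's real API.
@dataclass
class Voice:
    name: Optional[str] = None        # preset voice id, e.g. "af_heart"
    ref_audio: Optional[str] = None   # path to reference audio for cloning
    voice_text: Optional[str] = None  # caption/transcript of ref_audio

    def is_cloned(self) -> bool:
        return self.ref_audio is not None

def resolve_voice(voice: Voice) -> str:
    """Validate a voice spec; cloning models need the caption too."""
    if voice.is_cloned():
        if voice.voice_text is None:
            raise ValueError("ref_audio requires voice_text (the caption)")
        return f"clone:{voice.ref_audio}"
    return f"preset:{voice.name}"
```

Under this design, Kokoro-style models consume the preset name while Sesame/F5-TTS-style models consume the reference audio plus caption, and the caller passes the same object either way.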

@Blaizzy
Owner

Blaizzy commented Mar 17, 2025

>> Could you upload your copy to mlx-community with the name: mlx-community/csm-1b-bf16

> The base model is fp32, not bf16, so I'll put it at mlx-community/csm-1b 👍

Sure, that makes sense.

I'm used to converting it to bf16.

@Blaizzy
Owner

Blaizzy commented Mar 17, 2025

>> I'm thinking about a general API design. For instance, in my view ref_audio == voice; depending on the model, it will alternate between text and a path to an audio file. But I'll save refactoring for v0.1.0, when we get STS up and running.

> Yep, that makes sense. We'll need some kind of voice_caption / voice_text parameter for Sesame and models like F5-TTS, since they require the caption alongside the reference audio.

Got it, let me check a few things and come back with some suggestions.

@Blaizzy Blaizzy merged commit 267d61e into Blaizzy:main Mar 17, 2025
1 check passed
@Blaizzy
Owner

Blaizzy commented Mar 17, 2025

Merged! 🚀

@lucasnewman lucasnewman deleted the sesame branch March 19, 2025 22:11